Abstract

We consider recent works on the simulation of quantum circuits using the formalism
of matrix product states and the formalism of contracting tensor networks. We provide
simplified direct proofs of many of these results, extending an explicit class of efficiently
simulable circuits (log depth circuits with 2-qubit gates of limited range) to the following:
let $C$ be any poly sized quantum circuit (generally of poly depth too) on $n$ qubits
comprising 1- and 2-qubit gates and 1-qubit measurements (with 2-qubit gates acting
on arbitrary pairs of qubit lines). For each qubit line $i$ let $D_i$ be the number of 2-qubit
gates that touch or cross the line $i$, i.e. the number of 2-qubit gates that are applied
to qubits $j,k$ with $j \le i \le k$. Let $D = \max_i D_i$. Then the quantum process can be
classically simulated in time $n\,\mathrm{poly}(2^D)$. Thus if $D = O(\log n)$ then $C$ may be
efficiently classically simulated.

1 Introduction

The issue of finding nontrivial families of quantum circuits that can be classically efficiently
simulated is a fundamentally important one for the understanding of quantum computational
power and for the design of new quantum algorithms. It is known [1] that the absence
of increasing multi-partite entanglement in a quantum algorithm is sufficient to guarantee
an efficient simulation and that the question of efficient simulability is closely related to
the variety of different possible mathematical formalisms for the representation of quantum
states and operations. To further study the origins of quantum computational speedup it is
thus natural to try to identify classes of circuits that can generate entanglement yet which
can also be classically efficiently simulated. Interesting such families have been recently
identified by Markov and Shi [3] (using the notion of tree-width of a graph and the formalism
of contracting tensor networks) and by Yoran and Short [2] (using matrix product state
representations and the one-way quantum computer formalism). The purpose of this note
is to present alternative derivations of many of their results with proofs that are simpler
and considerably more direct. We also extend the identified families of efficiently simulable
circuits (although our extension may well also be amenable to analysis by the techniques of
[3]). We refer to the introduction of [3] for a comprehensive summary of existing results
on the simulation of quantum circuits, which we do not duplicate here.

We begin with a statement of our main result. 

Definition 1 Let $C$ be any poly sized quantum circuit on $n$ qubits comprising 1- and 2-qubit
gates. The reduced form $C_{\mathrm{red}}$ of $C$ is constructed as follows. First we multiply together all
1-qubit gates that lie consecutively along any single line and then multiply the result into the
following or preceding 2-qubit gate. This eliminates all 1-qubit gates from $C$. Next for every
pair of lines $i,j$ we consider all collections of consecutive 2-qubit gates acting on lines $i,j$
that can be performed in sequence without any other interposed gate. Each such collection
is replaced by the single 2-qubit gate given by the product. The resulting circuit $C_{\mathrm{red}}$ is the
reduced form of $C$.
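The merging of 2-qubit gates in Definition 1 can be sketched in a few lines. This is only an illustration under stated assumptions: 1-qubit gates are taken to be absorbed already, each gate is a hypothetical `(i, j, label)` triple, and only labels are tracked (an actual implementation would multiply the 4 by 4 matrices, respecting the order of the two lines). The name `reduce_two_qubit_gates` is ours, not the paper's.

```python
def reduce_two_qubit_gates(gates):
    """Merge runs of 2-qubit gates on the same pair of lines.

    gates: list of (i, j, label) in circuit order.
    Returns a list of (i, j, labels) where labels is the tuple of merged gates.
    """
    reduced = []        # entries [lo, hi, [labels]]
    last_on_line = {}   # line -> index in `reduced` of the last gate touching it
    for i, j, label in gates:
        lo, hi = min(i, j), max(i, j)
        last_lo, last_hi = last_on_line.get(lo), last_on_line.get(hi)
        # Mergeable only if the same earlier gate is the last one on BOTH lines,
        # i.e. no other gate touching either line is interposed.
        if last_lo is not None and last_lo == last_hi:
            reduced[last_lo][2].append(label)
        else:
            reduced.append([lo, hi, [label]])
            last_on_line[lo] = last_on_line[hi] = len(reduced) - 1
    return [(lo, hi, tuple(labels)) for lo, hi, labels in reduced]


example = [(1, 2, "U1"), (1, 2, "U2"), (2, 3, "V"),
           (1, 2, "U3"), (3, 4, "W"), (1, 2, "U4")]
print(reduce_two_qubit_gates(example))
```

In the example, `U1, U2` merge and `U3, U4` merge (the gate `W` on lines 3,4 does not touch lines 1,2), while `V` keeps the two groups apart.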

An example of a circuit reduction is shown in figure 1. 



[Figure 1 panels: a circuit $C$ (left) and its reduced form $C_{\mathrm{red}}$ (right).]

Figure 1: Single black dots denote 1-qubit gates and a dot pair connected by a vertical line denotes 
a 2-qubit gate acting on the designated lines. 

It is clear that the passage from the gate description of any poly sized circuit $C$ to its
reduced form $C_{\mathrm{red}}$ can be calculated in poly time (as the passage involves the multiplication
of at most poly many matrices of sizes 2 by 2 or 4 by 4). Also $C$ and $C_{\mathrm{red}}$ clearly represent
the same overall transformation. Hence $C$ can be classically efficiently simulated iff $C_{\mathrm{red}}$
can. Our main result is (two elementary proofs of) the following.

Theorem 1 Let $C$ be any poly sized (and generally poly depth) quantum circuit on $n$ qubits
comprising 1- and 2-qubit gates (with 2-qubit gates acting on arbitrary pairs of qubit lines).
Let the input be any product state of the $n$ qubits and let the output be the result of a
measurement in the standard basis on any prescribed subset of qubits after the application
of $C$. Let $C_{\mathrm{red}}$ be the reduced form of $C$. For each qubit line $i$ in $C_{\mathrm{red}}$ let $D_i$ be the number of
2-qubit gates that touch or cross the line $i$, i.e. the number of 2-qubit gates that are applied
to qubits $j,k$ with $j \le i \le k$. Let $D = \max_i D_i$. Then the output of the quantum process can
be classically simulated in time $n\,\mathrm{poly}(2^D)$. Thus if $D = O(\log n)$ then $C$ may be classically
efficiently simulated.
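As a concrete reading of the quantity $D$ in Theorem 1, the following sketch counts, for each line $i$, the 2-qubit gates on lines $j,k$ with $j \le i \le k$. It assumes the circuit is already reduced and is given simply as a list of pairs of 1-based line numbers; the function names are hypothetical.

```python
def line_counts(n, gates):
    """Return [D_1, ..., D_n]: gates touching or crossing each line.

    gates: list of (j, k) pairs of 1-based line numbers of 2-qubit gates.
    """
    counts = [0] * n
    for j, k in gates:
        lo, hi = min(j, k), max(j, k)
        for i in range(lo, hi + 1):   # every line with lo <= i <= hi
            counts[i - 1] += 1
    return counts


def circuit_D(n, gates):
    """D = max_i D_i (0 for a circuit without 2-qubit gates)."""
    return max(line_counts(n, gates), default=0)


n = 6
ladder = [(i, i + 1) for i in range(1, n)]                 # (1,2), (2,3), ..., (5,6)
long_range = [(i, i + n // 2) for i in range(1, n // 2 + 1)]  # (1,4), (2,5), (3,6)
print(circuit_D(n, ladder), circuit_D(n, long_range))
```

The ladder has $D = 2$ for any $n$, whereas the single layer of long-range gates has $D = n/2$.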

We will give two direct elementary proofs of this theorem, the first using matrix product
states (MPS) and the second using just the simple process of multiplying gates ("contracting
tensors"). Our MPS proof will extend the results of [2]. In that paper the one-way quantum
computer formalism (1WQC) is combined with MPS to show (amongst other results) that
log depth circuits involving only 2-qubit gates of restricted range can be classically efficiently
simulated. Our result improves on this in several ways: we use only MPS without any
recourse to 1WQC in our proof, and our theorem, clearly including circuits of the above
type, includes many further circuit families of fully poly depth and involving 2-qubit gates
of unrestricted range.

We note that theorem 1 is closely similar to proposition 5.1 of [3] where the proof involves
recourse to the labyrinthine theory of tree decompositions and tree width of graphs
and associated contraction orderings. In contrast our second proof is transparently elementary
and our theorem appears to cover essentially all the explicitly mentioned families
of efficiently simulable circuits in [3]. Thus it would be interesting to display a family of
circuits that can be seen to be efficiently simulable from the sophisticated formalism of tree
decompositions of graphs etc. but which is not amenable to the elementary methods that
we present below.

We note that our theorem may also be readily extended to the case where $C$ allows
1-qubit measurements within the body of the circuit, with choices of later gates and measurements
depending adaptively on earlier measurement outcomes. (In that case we apply
the definition of $D_i$ to $C$ rather than $C_{\mathrm{red}}$.) But for clarity of exposition we state our theorem
in the simpler form above. Also all results and definitions in this paper generalise
immediately from qubits to qudits for any fixed dimension $d$ but again for clarity we work
with qubits ($d = 2$) throughout.

2 Matrix product states 

A product state of $n$ qubits has the form

$$|\psi\rangle_{1\ldots n} = |\alpha\rangle_1 |\beta\rangle_2 \cdots |\kappa\rangle_n \qquad (1)$$

which manifestly depends on a number of parameters that grows only linearly with $n$ (in
contrast to exponentially, for general states). This restriction on the size of the state
description is significant since it allows an efficient classical simulation of quantum processes
involving such states. In the standard basis $|\psi\rangle$ may be written as $\sum c_{i_1 \ldots i_n} |i_1\rangle \cdots |i_n\rangle$ but
now we have an exponentially large description (the set of amplitudes) which is restricted
by further conditions viz. that $c_{i_1 \ldots i_n} = a_{i_1} b_{i_2} \cdots k_{i_n}$ for some $a_i, \ldots, k_i$'s.

A matrix product state (MPS) of $n$ qubits is a natural generalisation of the form eq. (1):
we simply replace each state $|\alpha\rangle, \ldots$ by a matrix of (generally sub-normalised) states
$|\alpha_{ij}\rangle, \ldots$ and form a corresponding matrix product of the $n$ matrices:

$$|\psi\rangle = \sum_{i,j,k,l,\ldots,m} |\alpha_{ij}\rangle |\beta_{jk}\rangle |\gamma_{kl}\rangle \cdots |\kappa_{mi}\rangle. \qquad (2)$$

The sizes of the matrices may be freely chosen subject only to the compatibility requirement
for the formation of matrix products. Note that the first and last indices ($i$ in eq. (2)) are
also summed. If we write the matrices $|\alpha_{ij}\rangle$ as $A$ etc. then $|\psi\rangle = \mathrm{tr}\,(ABC \ldots K)$. The
special case of eq. (1) is recovered if all the matrices are 1 by 1. If we write all states in
components in the standard basis:

$$|\alpha_{ij}\rangle = \sum_{i_1} A^{i_1}_{ij} |i_1\rangle, \quad \ldots, \quad |\kappa_{mi}\rangle = \sum_{i_n} K^{i_n}_{mi} |i_n\rangle$$

then we get the more cumbersome expression

$$|\psi\rangle = \sum_{i,\ldots,m,\,i_1,\ldots,i_n} A^{i_1}_{ij} B^{i_2}_{jk} \cdots K^{i_n}_{mi}\, |i_1\rangle \cdots |i_n\rangle \qquad (3)$$

which now involves matrices of complex numbers rather than quantum states.
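To make the component form concrete, the sketch below evaluates a single amplitude of an MPS as the trace of a product of complex (here integer) matrices, one matrix per qubit selected by that qubit's bit value. The data layout (a list of sites, each a pair of matrices) and the function names are our own assumptions; the example is an unnormalised GHZ-type state with 2 by 2 matrices, and a product state with 1 by 1 matrices.

```python
from itertools import product


def matmul(X, Y):
    """Plain matrix product of nested lists."""
    return [[sum(X[r][m] * Y[m][c] for m in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def mps_amplitude(sites, bits):
    """Amplitude of |bits> : trace of the product of the selected matrices.

    sites[q] is a pair (M0, M1): the matrices of qubit q for bit values 0 and 1.
    """
    M = sites[0][bits[0]]
    for site, b in zip(sites[1:], bits[1:]):
        M = matmul(M, site[b])
    return sum(M[d][d] for d in range(len(M)))   # the first/last index is summed too


# Unnormalised GHZ-type state |000> + |111>: the same pair of matrices at each site.
ghz_site = ([[1, 0], [0, 0]], [[0, 0], [0, 1]])
ghz = [ghz_site] * 3
for bits in product((0, 1), repeat=3):
    print(bits, mps_amplitude(ghz, bits))

# Product state: all matrices are 1 by 1, so the amplitude is a plain product.
prod_state = [([[2]], [[3]]), ([[5]], [[7]])]
print(mps_amplitude(prod_state, (1, 0)))   # 3 * 5
```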

Remark on PEPS: the definition of MPS requires a choice of one-dimensional ordering
of the qubit subsystems. However there is an alternative description of MPS - the so-called
"projected entangled pairs state" description (PEPS) [4, 5] - with the useful feature that it
generalises in a natural way to two and higher dimensional arrays of subsystems. Briefly it
works as follows: consider the MPS of $n$ qubits in eq. (3) with each matrix having size at
most $L$ by $L$. We start with a sequence of $n$ maximally entangled pairs of $L$-level systems
in states $|\Lambda\rangle = \sum_{i=0}^{L-1} |i\rangle |i\rangle$ arranged in a line as depicted in figure 2.

[Figure 2 diagram; site labels: site n, site 1, site 2, ..., site n - 1, site n.]

Figure 2: Each line connecting a pair of stars represents the $L$-level maximally entangled state $|\Lambda\rangle$.
Each site comprises two $L$-level systems and we apply a linear projection from $L \times L$ dimensions to 2
dimensions at each site, resulting in a state of $n$ qubits. This PEPS is identified as having the MPS
form with matrices in eq. (3) given directly by the matrices of the linear projection operations.

Consider the linear maps from $L \times L$ to 2 dimensions (with repeated indices summed):

$$P_1 = A^{i_1}_{ij}\, |i_1\rangle \langle i| \langle j|, \quad P_2 = B^{i_2}_{jk}\, |i_2\rangle \langle j| \langle k|, \quad \ldots, \quad P_n = K^{i_n}_{mi}\, |i_n\rangle \langle m| \langle i|$$

applied at sites $1, 2, \ldots, n$ respectively resulting in an $n$ qubit state called a PEPS. Since
$|\Lambda\rangle$ has Kronecker delta components, $|\Lambda\rangle = \sum_{ij} \delta_{ij} |i\rangle |j\rangle$, we immediately see that the
resulting PEPS of $n$ qubits is precisely the MPS in eq. (3).

To generalise to 2 (or higher) dimensional arrays of sites we begin instead with a 2
dimensional grid of the entangled $|\Lambda\rangle$ states. Now each site in the body of the grid has 4
$L$-level systems (or 3 or 2 at the edges or corners respectively, of the 2-dimensional grid) and
we can consider similar site projections from each whole site into a 2-dimensional subspace
of the site. If $L$ is restricted to stay suitably small (e.g. grow only polynomially with the
number of sites) then the resulting multi-qubit PEPS will again depend on only poly many
parameters. This formalism was introduced and exploited in [5] to provide ground breaking
new techniques in the study of 2 and 3 dimensional strongly correlated quantum systems in
condensed matter physics. It is also noteworthy that the Raussendorf-Briegel cluster state
(in 1 or 2 or higher dimensional array configurations) underlying the one-way quantum
computation model is very special in having the simplest possible PEPS description: the
entangled pairs always have minimal possible dimension $L = 2$ and the projections are
all the same, viz. $P = \sum_{i=0}^{1} |i\rangle \langle i i \ldots i|$, identifying at each site the qubit subspace spanned
by the two kets of "all 0's and all 1's" (cf. [6] or [8] for a more detailed description). For
applications to quantum circuits in this note we will consider only multi-qubit states in a
one-dimensional ordering. □

It is not hard to see (cf. below for one method) that any state $|\psi\rangle$ of $n$ qubits can be
expressed in the form eq. (2) but generally requiring matrices of exponential size in $n$. Here
we will be interested in those states expressible using matrices of restricted sizes. If each
matrix has size $L$ by $L$ then the state $|\psi\rangle$ will depend on $O(nL^2)$ parameters. Hence if
we restrict $L$ to be poly($n$) we will obtain a family of states (generalising product states)
that depend on only poly-many parameters, hence allowing efficient classical simulation of
their processing if we are given suitable methods of updating the MPS description after
application of gates and measurements.

This rather abstract restriction of requiring limited matrix sizes may be usefully related
to more familiar constructs such as the Schmidt rank of bipartite divisions of $|\psi\rangle$ and
log-depthness of poly sized quantum circuits (cf. below).

To see that any state $|\psi\rangle$ of $n$ qubits may be expressed in the MPS form we describe
an explicit construction using an iterated Schmidt decomposition, introduced by Vidal [7].
The qubits are always taken to be labelled $1, 2, \ldots, n$ in linear order from left to right. We
will use the following elementary facts about Schmidt decompositions. Let $|\phi\rangle_{AB}$ be any
bipartite state with Schmidt rank $r$ and Schmidt form

$$|\phi\rangle_{AB} = \sum_{i=1}^{r} \sqrt{\lambda_i}\, |a_i\rangle_A |b_i\rangle_B.$$

Then:

Fact 1: if $|\zeta\rangle_B$ is any state that is orthogonal to all the $|b_i\rangle$'s then $|\eta\rangle_A = \langle \zeta | \phi \rangle = 0$.
Fact 2: if $|\phi\rangle = \sum_i |c_i\rangle |b_i\rangle$ for some $|c_i\rangle$'s then $|c_i\rangle = \sqrt{\lambda_i}\, |a_i\rangle$ for all $i$.
Fact 3: the Schmidt rank $r$ is equal to the dimension of the support of the reduced state
of $A$ in $|\phi\rangle$.

To get an MPS form for $|\psi\rangle$ we begin by writing it in its Schmidt form for the partition
$1|2 \ldots n$:

$$|\psi\rangle = \sum_j |a_j\rangle_1 |\xi_j\rangle_{2 \ldots n}.$$

Here we have absorbed the Schmidt coefficients into the LH set: $|a_j\rangle$ are orthogonal and
subnormalised, and $|\xi_j\rangle$ are orthonormal. The range of the index $j$ is the Schmidt rank of
$|\psi\rangle$ for this partition. Next let $|\eta_k\rangle_{3 \ldots n}$ be the orthonormal Schmidt basis of $|\psi\rangle$ for the RH
part of the partition $12|3 \ldots n$. Then for each $j$ we have

$$|\xi_j\rangle_{2 \ldots n} = \sum_k |b_{jk}\rangle_2 |\eta_k\rangle_{3 \ldots n}$$

where $|b_{jk}\rangle$ are again generally subnormalised (and non-orthogonal). To see that $|\xi_j\rangle$ cannot
have a component outside the span of the $|\eta_k\rangle$'s let $|\gamma_l\rangle$ be a set of orthonormal vectors
extending $|\eta_k\rangle$ to a full basis of the $3 \ldots n$ system, so in complete generality we can write

$$|\xi_j\rangle_{23 \ldots n} = \sum_k |b_{jk}\rangle_2 |\eta_k\rangle_{3 \ldots n} + \sum_l |c_{jl}\rangle_2 |\gamma_l\rangle_{3 \ldots n}.$$

Then $\langle \gamma_l | \psi \rangle = \sum_j |a_j\rangle |c_{jl}\rangle$ which must be
zero by fact 1. Since $|a_j\rangle$ are orthogonal this implies $|c_{jl}\rangle = 0$ for all $j,l$.

Thus we have $|\psi\rangle = \sum_{jk} |a_j\rangle_1 |b_{jk}\rangle_2 |\eta_k\rangle_{3 \ldots n}$. Continuing in this way we get

$$|\psi\rangle = \sum_{j,k,l,\ldots,p} |a_j\rangle_1 |b_{jk}\rangle_2 |c_{kl}\rangle_3 \cdots |d_p\rangle_n. \qquad (4)$$

This is of the MPS form eq. (2): the size of the $i$-th matrix is $s$ by $t$ where $s$ (resp. $t$) is the
Schmidt rank of $|\psi\rangle$ for the partition $1 \ldots (i-1)\,|\,i \ldots n$ (resp. $1 \ldots i\,|\,(i+1) \ldots n$). We also
have further special conditions always satisfied by this particular MPS construction:
(i) the first (resp. last) matrix has only one row (resp. one column);
(ii) the last matrix is an orthonormal set of states;
(iii) if we consider any partition $1 \ldots i\,|\,(i+1) \ldots n$ and sum the respective parts first, writing

$$|\psi\rangle = \sum_m |L_m\rangle_{1 \ldots i}\, |R_m\rangle_{(i+1) \ldots n},$$

where $m$ is the matrix index linking sites $i$ and $i+1$ and $|L_m\rangle$, $|R_m\rangle$ denote the summed left and right parts,
then $|R_m\rangle$ is the orthonormal Schmidt basis for the system $(i+1) \ldots n$ and $|L_m\rangle$ is the
orthogonal Schmidt set for the system $1 \ldots i$, subnormalised to the corresponding Schmidt
coefficients. (To see this we just halt the iterative process leading to eq. (4) at stage $i$,
showing that $|R_m\rangle$ is the orthonormal Schmidt basis for the system $(i+1) \ldots n$, and then
use fact 2.)

3 Simulating computations using MPS's 

For any $n$ qubit state $|\psi\rangle$ let $\chi_\psi$ be the maximal Schmidt rank of any partition
$1 \ldots i\,|\,(i+1) \ldots n$ of the linearly ordered qubits into a left and right part.

Vidal [7] and Yoran and Short [2] have shown the following.

Lemma 1 [7] If a single qubit unitary gate is applied to any qubit of $|\psi\rangle$ then the MPS
description eq. (4) can be updated in $O(\chi_\psi^2)$ computational steps.

Lemma 2 [7] If a 2-qubit unitary gate is applied to any adjacent qubits (numbered $i, i+1$)
of $|\psi\rangle$ then the MPS description eq. (4) can be updated in $O(\chi_\psi^3)$ computational steps.

Remark: A 2-qubit gate $U$ applied to non-adjacent qubits on lines $l$ and $l + r$ can be
replaced by $(2r - 1)$ adjacent gates viz. $(r - 1)$ swaps on adjacent qubits to make lines $l$ and
$l + r$ adjacent, $U$ on adjacent qubits and $(r - 1)$ further adjacent swaps to return the lines
to their original positions, as shown in figure 3. Hence the whole process can be simulated
with $O(\chi_1^3) + \cdots + O(\chi_{2r-1}^3)$ computational cost where $\chi_i$ is $\chi$ of the $i$-th state in this process.
We will see later (cf. lemma 4 and theorem 1) that $\max_i \chi_i$ is in fact $O(\chi_\psi)$ (where $|\psi\rangle$ is
the initial state to which $U$ was applied) so the cost is $O(r \chi_\psi^3)$. In a quantum circuit on $n$
qubits we have $r = O(n)$ so the total cost will be poly($n$) if $\chi_\psi = \mathrm{poly}(n)$. □
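The replacement of a long-range gate by adjacent gates can be written out explicitly. The sketch below is one possible ordering of the swaps (moving the lower line up towards line $l$), with hypothetical gate tuples `(name, a, b)`; it is meant only to illustrate the count of $2r - 1$ adjacent gates and the fact that each line is touched at most 4 times.

```python
from collections import Counter


def adjacent_decomposition(l, r):
    """Replace a gate U on lines l and l + r by 2r - 1 gates on adjacent lines."""
    if r < 1:
        raise ValueError("the two lines must be distinct")
    # r - 1 swaps bring the qubit on line l + r up to line l + 1 ...
    swaps = [("SWAP", m, m + 1) for m in range(l + r - 1, l, -1)]
    # ... then U acts on adjacent lines, and the swaps are undone in reverse order.
    return swaps + [("U", l, l + 1)] + swaps[::-1]


def touches_per_line(gates):
    """Number of gates acting on each line."""
    counts = Counter()
    for _, a, b in gates:
        counts[a] += 1
        counts[b] += 1
    return counts


seq = adjacent_decomposition(1, 4)
print(seq)
print(len(seq), max(touches_per_line(seq).values()))
```

For `l = 1, r = 4` this gives 7 gates, and no line is acted on by more than 4 of them.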



[Figure 3 diagram: a 2-qubit gate on line $l$ and line $l + r$ (left) equals a sequence of adjacent swaps, the gate on adjacent lines, and further adjacent swaps (right).]

Figure 3: A non-adjacent 2-qubit gate acting on lines $r$ apart is replaced by $2r - 1$ adjacent gates.
The double headed arrows denote swap gates. In this replacement each qubit line is affected by at
most 4 gates.

Lemma 3 [2] If a single qubit measurement (in any chosen basis) is made on a qubit of $|\psi\rangle$
then the outcome probabilities and MPS description (as in eq. (4)) of any post-measurement
state can be calculated in poly($\chi_\psi$) computational steps.

Vidal [7] concluded the following: consider any pure state poly time quantum computation
with input size $n$. If the state at every stage has $\chi$ bounded by poly($n$) then the
quantum computation can be efficiently classically simulated. But he did not relate his
$\chi$-condition to any prospective structural property of a quantum circuit.

Yoran and Short [2] noted further that the 1WQC cluster state based on a 2 dimensional
grid of size $M \times N$ has an MPS description of the form eq. (4) with $\chi \le 2^{\min(M,N)}$.
Consequently any 1WQC process (defined by a sequence of at most $MN$ adaptive 1-qubit
measurements on the $M \times N$ cluster state) can be simulated classically in $MN\,\mathrm{poly}(\chi)$ time
where $\chi = 2^{\min(M,N)}$. They concluded the following result: let $C$ be any quantum gate array
on $n$ qubits comprising 1- and 2-qubit gates such that (a) $C$ has log depth and (b) the range
of each 2-qubit gate is bounded by a constant $r$, i.e. each 2-qubit gate acts on a pair of qubits
at most $r$ lines apart. Then the computation can be classically efficiently simulated.

Their proof proceeds by noting that any such computation can be translated by standard
methods into the 1WQC formalism using a cluster state of size $M \times N$ where $\min(M,N) =
O(\log n)$ (because the circuits are log depth) and also $\max(M,N) = \mathrm{poly}(n)$. Hence each
state in the 1WQC process will have $\chi = \mathrm{poly}(n)$ and lemma 3 gives the result.

In the translation into the 1WQC formalism, each 2-qubit gate acting on lines $r$ apart
requires a piece of cluster state of size $O(r) \times O(r)$. If $r$ is constant, the total cluster state
for the whole (log depth) circuit will thus have a log sized minimum dimension, but if $r$ is
even allowed to be $O(\log n)$ large, then the resulting required cluster may have minimum
dimension $O((\log n)^2)$ and hence the simulation by the method of [2] will now require
$\mathrm{poly}(2^{O((\log n)^2)}) = \mathrm{poly}(n^{\log n})$ time classically.

We now introduce a further lemma about the Schmidt MPS form eq. (4). This leads
to a more direct proof of the above result for quantum circuits satisfying (a) and (b)
without recourse to the 1WQC model or cluster states. Indeed any such circuit clearly
has $D = O(\log n)$. At the same time, by proving our theorem 1 we will extend the class of
quantum circuits that can be classically efficiently simulated.

Lemma 4 Let $|\psi\rangle_{1 \ldots n}$ have Schmidt rank $r$ for the partition $A|B = 1 \ldots i\,|\,(i+1) \ldots n$.
(i) Let $|\psi'\rangle$ be obtained from $|\psi\rangle$ by applying a 2-qubit gate $U$ to two qubits numbered $k,l$ of
$|\psi\rangle$. Let $r'$ be the $A|B$ Schmidt rank of $|\psi'\rangle$. If $k,l$ are both in $A$ or both in $B$ then $r' = r$.
If one of $k,l$ is in $A$ and the other in $B$ then $r' \le 4r$.
(ii) Let $|\psi'\rangle$ be obtained from $|\psi\rangle$ by application of a 1-qubit gate. Then $r' = r$.
(iii) Let $|\psi'\rangle$ be any post-measurement state resulting from a 1-qubit measurement on $|\psi\rangle$.
Then $r' \le r$.

Remark: the bound $r' \le 4r$ in (i) is tight. Let $|\phi^+\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ and take
$|\psi\rangle_{1234} = |\phi^+\rangle_{12} |\phi^+\rangle_{34}$ with partition $A|B = 12|34$. Consider the 2-qubit gate $U$ of swap on 2,3.
Then $|\psi\rangle$ has $r = 1$ but after $U$ we have $r' = 4$. (For qudits the tight upper bound is a
multiplicative factor of $d^2$.)

Proof of lemma 4: (i) If $k,l$ are both on the same side of the partition then clearly $r' = r$.
Thus suppose $k,l$ lie on the two sides. Without loss of generality we may assume that $k = i$
and $l = i + 1$ since swap operations within $A$ and $B$ do not change Schmidt rank. Let
$|\psi\rangle = \sum_{k=1}^{r} |a_k\rangle_A |b_k\rangle_B$ be the Schmidt form. For each $k$ let $\sigma_k$ be the reduced state of
qubit $i + 1$ in $|b_k\rangle$ and let $|s_k\rangle, |t_k\rangle$ be its two eigenstates. Hence in $U|\psi\rangle$ the reduced state
of $A \cup \{i + 1\}$ is spanned by the $2r$ states $U(|a_k\rangle |s_k\rangle)$, $U(|a_k\rangle |t_k\rangle)$ for $k = 1, \ldots, r$. Tracing
out qubit $i + 1$ from each of these (generally entangled) states gives a state of $A$ of rank at most 2
in each case, so the reduced state of $A$ in $U|\psi\rangle$ is supported on dimension at most $2(2r)$. Hence
$r' \le 4r$ follows by fact 3.

(ii) is clear. (iii) follows by noting that any post-measurement state is obtained by
applying a suitable projection to $|\psi\rangle$. Hence the support of any reduced state cannot
increase and fact 3 gives the result. □
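The tightness example in the remark to lemma 4 (two Bell pairs on qubits 12 and 34, followed by a swap on qubits 2,3) can be checked numerically. The sketch below is our own illustration: it stores a 4-qubit state as a dictionary from bit strings to amplitudes, uses unnormalised integer amplitudes (normalisation does not affect the rank), and computes the Schmidt rank across $12|34$ as the rank of the 4 by 4 amplitude matrix by exact Gaussian elimination.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Rank of a small matrix by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, len(M)):
            f = M[r][col] / M[rank][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[rank])]
        rank += 1
        if rank == len(M):
            break
    return rank


def schmidt_rank_12_34(amp):
    """Schmidt rank across 12|34: rows indexed by (q1, q2), columns by (q3, q4)."""
    pairs = list(product((0, 1), repeat=2))
    return matrix_rank([[amp.get((a, b, c, d), 0) for c, d in pairs]
                        for a, b in pairs])


# Unnormalised |phi+>_12 |phi+>_34: amplitude 1 whenever q1 == q2 and q3 == q4.
before = {(a, a, b, b): 1 for a, b in product((0, 1), repeat=2)}
# Swap on qubits 2,3 exchanges the second and third bit of every basis state.
after = {(q1, q3, q2, q4): v for (q1, q2, q3, q4), v in before.items()}

print(schmidt_rank_12_34(before), schmidt_rank_12_34(after))
```

The rank rises from 1 to 4, matching the factor of 4 in lemma 4(i).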

Proof of theorem 1 using MPS formalism: Let $C$ be any poly sized quantum circuit
on $n$ qubits, of the kind in the statement of the theorem. Replace each 2-qubit gate $U$ of
$C_{\mathrm{red}}$ that acts on non-adjacent qubits $r$ lines apart, by $r - 1$ adjacent swap gates, $U$ on
adjacent qubits, and $r - 1$ further adjacent swap gates to restore the line positions. It is
clear from figure 3 that this sequence of $2r - 1$ 2-qubit gates (each now acting on adjacent
qubits) touches any given qubit line at most 4 times (and crosses no qubit lines because of
adjacency). Thus as a result of this replacement the $D$ value $D'$ of the circuit is increased
by at most a factor of 4 ($D' \le 4D$) and now all 2-qubit gates act on adjacent qubits. The
starting state has $\chi_{\psi_0} = 1$. Consider simulating the circuit operations in order using lemmas
1, 2, 3. Let $|\psi_k\rangle$ be the state at any stage of the process. For any partition $1 \ldots i\,|\,(i+1) \ldots n$,
$|\psi_k\rangle$ will, by lemma 4, have a Schmidt rank of at most $4^{D'_i}$ where $D'_i$ is the number of 2-qubit
gates acting on lines $i, i+1$. Hence the maximal Schmidt rank $\chi_{\max}$ of any state $|\psi_k\rangle$ in the
process, across any partition, is at most $4^{D'} = 4^{O(D)}$. Lemmas 1, 2, 3 then show that each
step can be classically simulated in $\mathrm{poly}(4^{D'}) = \mathrm{poly}(4^D)$ classical computational steps and
the whole process in time $T\,\mathrm{poly}(4^D)$ where $T$ is the total number of gates in $C_{\mathrm{red}}$. Note
that for any circuit $C$ on $n$ qubits, if $D$ is prescribed then $C$ can have at most $nD/2$ 2-qubit
gates as each such gate touches or crosses at least 2 lines. Hence the full simulation time is
$n\,\mathrm{poly}(4^D) = n\,\mathrm{poly}(2^D)$, and if $D = O(\log n)$ then the simulation is efficient. □
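The bookkeeping in this proof can be illustrated on a small example. The sketch below (our own helper names, gates given as pairs of 1-based line numbers, and one particular ordering of the swaps assumed) rewrites every gate into adjacent gates, recomputes the touch-or-cross count, and checks $D' \le 4D$; the resulting bound $4^{D'}$ on the Schmidt rank is printed for orientation only.

```python
def touch_counts(n, gates):
    """D_i for each line i = 1..n: gates on (j, k) with j <= i <= k."""
    counts = [0] * (n + 1)               # index 0 unused, lines are 1-based
    for j, k in gates:
        for i in range(min(j, k), max(j, k) + 1):
            counts[i] += 1
    return counts[1:]


def make_adjacent(gates):
    """Replace each gate on lines (lo, hi) by 2(hi - lo) - 1 adjacent gates."""
    out = []
    for j, k in gates:
        lo, hi = min(j, k), max(j, k)
        swaps = [(m, m + 1) for m in range(hi - 1, lo, -1)]
        out += swaps + [(lo, lo + 1)] + swaps[::-1]
    return out


n = 6
gates = [(1, 4), (2, 6), (3, 5), (1, 6)]     # a small illustrative circuit
D = max(touch_counts(n, gates))
adjacent = make_adjacent(gates)
D_adj = max(touch_counts(n, adjacent))
print("D =", D, " D' =", D_adj, " Schmidt rank bound 4^D' =", 4 ** D_adj)
```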

Remark: the $D = O(\log n)$ condition in theorem 1 does not imply an efficient simulation of
all poly sized log depth quantum circuits. Recall that a general poly sized log depth quantum
circuit on $n$ qubits is one for which the gates can be transversally partitioned into $O(\log n)$
layers of gates such that all gates in each layer can be done simultaneously in parallel. Thus
a layer may contain $O(n)$ gates so $D$ could be $O(n)$ and hence not efficiently simulable by the
method in the proof of theorem 1. (For example the circuit could have $O(n)$ gates applied to
qubits $(1, \frac{n}{2} + 1), (2, \frac{n}{2} + 2), \ldots, (\frac{n}{2}, n)$ in a single layer and then $D_{n/2}$ would be $O(n)$.) Indeed
in [9] it was shown that if circuits of even constant depth 3 (followed by a measurement
layer) are efficiently simulable then all quantum computation would be efficiently simulable
(i.e. then BQP=BPP). They also showed by an elementary argument that any circuit of
depth 2 (followed by a measurement layer) is efficiently simulable. From the notions in
theorem 1 this can be seen as follows: firstly we can reorder the qubit lines so that all
2-qubit gates in layer 1 act on adjacent qubits, i.e. on line pairs $(1,2), (3,4), \ldots, (n-1,n)$.
Then the gates in layer 2 may still have $O(n)$ range but it is straightforward to see that the
line pairs can now be reordered (thus preserving adjacency of layer 1 gates) so that layer 2
gates have range at most 4, and the efficient simulability then follows from theorem 1. □

Remark: The condition $D = O(\log n)$ in theorem 1 (for efficient simulability) does not
require that the circuit be of log depth. For example a "ladder circuit" of $O(n)$ 2-qubit
gates applied in order to qubits $(1,2), (2,3), \ldots, (n-1,n)$ has $D = 2$ and it is poly sized
with poly depth too. This circuit can be efficiently simulated by the method in the proof of
theorem 1 but not by the method of ref. [2] as its 1WQC translation requires a cluster state
of poly by poly size in 2 dimensions. □

Corollary 1 (This reproduces a result from [2].) Consider any 1WQC process on a 2
dimensional cluster state of size $M = \mathrm{poly}(n)$ by $N = O(\log n)$. Then the process can be
simulated in poly($n$) classical time.

Proof: Using the linear labelling used in [2] of qubits in a cluster state of size $M \times N$ with
$N = O(\log n)$, it is clear that this cluster state can be manufactured by a poly sized circuit
with $D = O(\log n)$ and then by theorem 1 (or more precisely, a straightforward extension
allowing adaptive gates and measurements) the subsequent measurement sequence can be
simulated in poly($n$) time too (as 1-qubit gates or measurements do not change the value
of $D$). □

4 Contracting linear networks 

We now give a completely different proof of theorem 1 based on the idea of "contracting
tensor networks". This really just amounts to multiplying out the matrices corresponding
to gates in a circuit and noting conditions under which this calculation (with matrices of
potentially exponentially growing size) can remain only poly sized. This subject was recently
introduced into the study of quantum circuits by Markov and Shi [3]. Our treatment here
has the advantage of being much simpler but may lack the full generality of their results.

In the above proof of theorem 1 we made much use of unitarity and orthogonality, for
example in the very concept of a Schmidt decomposition and unitarity preserving orthogonality
in updating the Schmidt MPS description. Thus it is surprising to note that the
tensor network approach below, based on general linear algebraic properties only, makes no
use of unitarity at all and remains valid for arbitrary linear gates!

Proof of theorem 1 using linear network formalism: Let $C$ be any circuit of the kind
in theorem 1 with $D = \max_i D_i$. Consider first the case that the output of the circuit is a
single 1-qubit measurement on any single qubit line, without loss of generality (wlog) the
first line.

Also assume wlog that $C$ has been reduced. Since $C$ is poly-sized this can be effected in
poly($n$) time and the circuit now comprises only 2-qubit gates.

Furthermore assume wlog that all 2-qubit gates act on adjacent qubit lines - using the
construction of figure 3 this can be arranged subject only to at most a constant factor 4
increase in the value of $D$.

Suppose also wlog that the input state for the circuit is the $n$-qubit state $|0\rangle \ldots |0\rangle$.
(Any other product input state can then be manufactured first using only 1-qubit gates.)
Let $b_i$ denote the components of the vector $|0\rangle$ in the standard basis.
On the circuit diagram of $C$, for each qubit line we place an index label on each segment
between the occurrence of two 2-qubit gates and we label the beginning of each line with
the input state $b$. This is illustrated in figure 4 for a simple typical circuit.

Figure 4: Index labels for a simple illustrative circuit of four 2-qubit gates $U, V, W, X$. On the first
line we use $i$'s, on the second line $j$'s etc. for the index names. The number of indices on any line
is $O(D)$.

Each 2-qubit gate now has 2 input indices and 2 output indices. We write inputs
as subscripts and outputs as superscripts so for example in figure 4, $V$ has components
$V^{j_3 k_2}_{j_2 k_1}$. Summing over common indices corresponds to composition of the gates in the circuit.
For example in figure 4, the 4-qubit output state $A$ has components labelled by indices
$i_3, j_4, k_3, l_2$ and is given by the contracted expression (easily read off from figure 4):

$$A^{i_3 j_4 k_3 l_2} = b_{i_1} b_{j_1} b_{k_1} b_{l_1}\; U^{i_2 j_2}_{i_1 j_1}\, V^{j_3 k_2}_{j_2 k_1}\, W^{i_3 j_4}_{i_2 j_3}\, X^{k_3 l_2}_{k_2 l_1}.$$

Here all the repeated indices are understood as being summed (i.e. contracted: the RHS
has an implied $\sum_{i_1 j_1 k_1 l_1 i_2 j_2 k_2 j_3}$).

Next suppose we want to compute prob($k$), the probability of obtaining $k = 0$ or $k = 1$
from a standard basis measurement on line 1. Let $\Pi_k$ be the matrix of the projector $|k\rangle\langle k|$.
Then for the example of figure 4, prob($k$) is given by the number obtained from the full
contraction of all indices in figure 5:

Figure 5: The fully contracted linear network whose value is prob($k$). Here $\Pi$ is the 1-qubit operation
$|k\rangle\langle k|$ and dagger denotes the adjoint matrix. This is the circuit of figure 4 extended with a reflected
adjoint copy and $\Pi$ inserted in the centre on line 1.
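As a small numerical illustration of such a contraction, the sketch below evaluates prob($k$) for a four-line circuit by brute-force summation over every index. All specifics are assumptions made for the example: gates $U$ on lines (1,2), $V$ on (2,3), $W$ on (1,2), $X$ on (3,4), each chosen to be a Hadamard on its first line followed by a CNOT, and input $|0000\rangle$. Summing $|A|^2$ over the unmeasured outputs is equivalent to contracting the circuit with its reflected adjoint copy and the projector on line 1; this brute-force loop is not the efficient line-by-line ordering described in the text.

```python
from itertools import product
from math import sqrt


def as_tensor(M):
    """T[o1][o2][i1][i2] = M[2*o1 + o2][2*i1 + i2] (outputs first, inputs last)."""
    return [[[[M[2 * o1 + o2][2 * i1 + i2] for i2 in (0, 1)]
              for i1 in (0, 1)] for o2 in (0, 1)] for o1 in (0, 1)]


s = 1 / sqrt(2)
# Illustrative gate: CNOT (control = first line) applied after Hadamard on the first line.
G = as_tensor([[s, 0, s, 0],
               [0, s, 0, s],
               [0, s, 0, -s],
               [s, 0, -s, 0]])
U = V = W = X = G
b = (1, 0)   # components of the input state |0> on every line


def amplitude(i3, j4, k3, l2):
    """Output amplitude: sum over the eight internal indices."""
    total = 0.0
    for i1, j1, k1, l1, i2, j2, k2, j3 in product((0, 1), repeat=8):
        total += (b[i1] * b[j1] * b[k1] * b[l1]
                  * U[i2][j2][i1][j1]      # U on lines 1,2
                  * V[j3][k2][j2][k1]      # V on lines 2,3
                  * W[i3][j4][i2][j3]      # W on lines 1,2
                  * X[k3][l2][k2][l1])     # X on lines 3,4
    return total


def prob(k):
    """Probability of outcome k on line 1: contract with the projector |k><k|."""
    return sum(abs(amplitude(k, j4, k3, l2)) ** 2
               for j4, k3, l2 in product((0, 1), repeat=3))


print(prob(0), prob(1))
```

For this particular choice of gates both outcomes on line 1 have probability 1/2.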

For a general $C$ (with $D = \max_i D_i$) we see that the corresponding construction (obtained
by extending $C$ with a reflected adjoint copy and inserting $\Pi$ in the centre on line 1) has at
most $2D + 1 = O(D)$ gates affecting any full single line. It now follows that the number
prob($k$) can be readily explicitly computed in $n\,\mathrm{poly}(2^{O(D)})$ steps. To do this we consider
the full tensor expression (as depicted in figure 5 for our illustrative example) and sum
all indices on line 1, e.g. all $i$-indices in figure 5. There are $O(D)$ $i$-indices each taking
values 0 and 1, i.e. a total of $2^{O(D)}$ terms in the sum for each set of $O(D)$ $j$-index values.
These $j$-indices have $2^{O(D)}$ sets of values too so the total computational effort to do all the
corresponding $i$-sums, for all $j$-values, is $2^{O(D)}\,\mathrm{poly}(2^{O(D)}) = \mathrm{poly}(2^{O(D)})$.

Having summed the $i$-indices we are left with a single object with $O(D)$ $j$-indices and
further 2-qubit gates with indices $j, k, \ldots, l$.

Next we sum out all the $j$-indices. This again requires $2^{O(D)}$ sums (for all the $k$-index
values) each of which is a sum over $2^{O(D)}$ terms (i.e. the sum over all $j$-index values).
Continuing in this way, if there are $n$ qubit lines the final number prob($k$) is computed in
$n\,\mathrm{poly}(2^{O(D)})$ steps. If $D = O(\log n)$ this whole computation is poly time.

In this way we can compute prob($k$) for $k = 0, 1$ and then sample the resulting distribution.
This completes the simulation of the quantum qubit measurement outcome.

Next suppose the output is not just a single measurement but the measurement of a
subset $\{a, b, \ldots, e\}$ of the qubit lines (a subset whose size could be $O(n)$). To simulate this,
we first compute as above the distribution for line $a$ only and sample the distribution to get
an outcome $k_a$ say. Then we place the matrix $\Pi(k_a)/\sqrt{\mathrm{prob}(k_a)}$ on line $a$ and calculate the
distribution for line $b$ given that line $a$ has value $k_a$, by repeating the above procedure. Then
we sample the resulting prob($k_b | k_a$) distribution to get a value $k_b$ for line $b$. Continuing in
this way we sample the whole required joint distribution on lines $a, b, \ldots, e$. If $D = O(\log n)$
this whole simulation is clearly still poly time as the size of the set of lines is at most $O(n)$.
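The line-by-line sampling of a joint outcome can be sketched independently of how each conditional distribution is obtained. In the sketch below the conditional probabilities come from an explicit toy table of joint probabilities, which is purely a stand-in (an assumption for illustration) for the tensor contraction with projectors already placed on the sampled lines; lines are numbered from 0 and all names are hypothetical.

```python
import random


def conditional_prob(joint, line, value, fixed):
    """prob(line = value | outcomes already fixed), from an explicit joint table.

    joint: dict mapping outcome tuples to probabilities.
    fixed: dict mapping already-sampled lines to their outcomes.
    """
    def consistent(outcome):
        return all(outcome[l] == v for l, v in fixed.items())

    denom = sum(p for o, p in joint.items() if consistent(o))
    numer = sum(p for o, p in joint.items() if consistent(o) and o[line] == value)
    return numer / denom


def sample_lines(joint, lines, rng):
    """Sample the lines one at a time, each conditioned on the earlier outcomes."""
    fixed = {}
    for line in lines:
        p0 = conditional_prob(joint, line, 0, fixed)
        fixed[line] = 0 if rng.random() < p0 else 1
    return fixed


# Toy joint distribution: three perfectly correlated bits.
joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(sample_lines(joint, [0, 1, 2], random.Random(1)))
```

Each call costs one conditional distribution per measured line, so $O(n)$ such computations suffice for any subset of lines.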

Finally consider the more general type of circuit C in which 1-qubit measurements can 
be performed in the body of the circuit and choice of later gates may depend on previous 
measurement outcomes. To simulate this situation we consider the circuit C only up to 
its first measurement and simulate the output distribution as above. After sampling it we 
place the matrix of the corresponding projector, divided by its square-root probability, on 
its line and fix the identity of any later gates that depended on this outcome. Continuing in 
this way with each subsequent measurement in order of occurrence we simulate the whole 
circuit $C$. Since $C$ has at most poly($n$) such measurements this entire simulation will be
poly time if $D = O(\log n)$.

Acknowledgements 

This work was supported in part by the UK's EPSRC-QIPIRC network and by the European
Commission under the Integrated Project Qubit Applications (QAP) funded by the IST
directorate as Contract Number 015848.

References 

[1] Jozsa, R. and Linden, N. Proc. Roy. Soc. Lond. A 459, 2011-2032 (2003). arXiv:quant-ph/0201143

[2] Yoran, N. and Short, A. arXiv:quant-ph/0601178 (2006)

[3] Markov, I. and Shi, Y. arXiv:quant-ph/0511069 (2005)

[4] Verstraete, F., Porras, D. and Cirac, I. Phys. Rev. Lett. 93, 227205 (2004). arXiv:cond-mat/0404706

[5] Verstraete, F. and Cirac, I. arXiv:cond-mat/0407066 (2004)

[6] Verstraete, F. and Cirac, I. Phys. Rev. A 70, 060302(R) (2004). arXiv:quant-ph/0311130

[7] Vidal, G. Phys. Rev. Lett. 91, 147902 (2003). arXiv:quant-ph/0301063

[8] Jozsa, R. arXiv:quant-ph/0508124 (2005)

[9] Terhal, B. and DiVincenzo, D. Quant. Inf. Comput. 4(2): 134-145 (2004). arXiv:quant-ph/0205133

